H-NND: A New Member of the In-Tree (IT) Clustering Family

Authors

  • Teng Qiu
  • Yongjie Li
Abstract

Previously, in 2014, we proposed the Nearest Descent (ND) method, which generates an efficient graph called the in-tree (IT). Owing to several attractive and effective features, the IT structure is well suited to data clustering. Although the IT contains some redundant edges, they usually have salient features and are therefore not hard to remove. Subsequently, to prevent these seemingly redundant edges from occurring, we proposed Nearest Neighbor Descent (NND), which adds a "neighborhood" constraint to ND. Consequently, unlike ND, NND no longer needs a step that removes redundant edges. However, NND is still not perfect, since it introduces a new and worse problem: over-partitioning. In this paper, we propose Hierarchical Nearest Neighbor Descent (H-NND), which overcomes the over-partitioning problem of NND by means of a hierarchical strategy. Specifically, H-NND uses ND to merge the over-segmented sub-graphs, or clusters, that NND produces. Like ND, H-NND generates an IT structure in which redundant edges reappear, which seemingly returns to the situation that ND faces. However, compared with ND, the redundant edges in the IT generated by H-NND are generally more salient, and are thus much easier and more reliable to identify, even with the simplest edge-removing method, which uses edge length as its only measure. In other words, the IT structure constructed by H-NND is better suited to data clustering. We demonstrate this on several clustering datasets of varying shapes, dimensions and attributes.
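The abstract does not spell out how the in-tree is built, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that each point has a density-like "potential" (here a sum of exponential kernels with a hypothetical bandwidth `sigma`), that each point links to its nearest point of strictly lower potential, and that clusters are obtained by cutting the longest edges, which is the edge-length-only removal rule the abstract mentions. The function names and the potential formula are illustrative choices, not the authors' implementation, and the neighborhood constraint of NND and the hierarchical merging of H-NND are not shown.

```python
import numpy as np


def nearest_descent_in_tree(points, sigma=1.0):
    """Build an in-tree: every node points to its nearest node of lower potential.

    Assumption (not from the abstract): potential_i = -sum_j exp(-d_ij / sigma),
    so denser regions have lower potential.
    Returns (parent, edge_len); the single root points to itself with length 0.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    potential = -np.exp(-dist / sigma).sum(axis=1)

    # Strict ordering by potential, ties broken by index, so no cycles can form.
    order = np.lexsort((np.arange(n), potential))
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)

    parent = np.arange(n)
    edge_len = np.zeros(n)
    for i in range(n):
        lower = np.flatnonzero(rank < rank[i])  # candidates with lower potential
        if lower.size:
            j = lower[np.argmin(dist[i, lower])]
            parent[i], edge_len[i] = j, dist[i, j]
    return parent, edge_len


def cut_longest_edges(parent, edge_len, n_clusters):
    """Remove the (n_clusters - 1) longest edges and label nodes by their root."""
    parent = parent.copy()
    if n_clusters > 1:
        longest = np.argsort(edge_len)[-(n_clusters - 1):]
        parent[longest] = longest  # each cut node becomes the root of a new tree
    labels = np.empty(len(parent), dtype=int)
    for i in range(len(parent)):
        r = i
        while parent[r] != r:  # follow parent pointers up to the root
            r = parent[r]
        labels[i] = r
    return labels


# Usage: two well-separated groups of four points each.
pts = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
parent, edge_len = nearest_descent_in_tree(pts)
labels = cut_longest_edges(parent, edge_len, n_clusters=2)
print(labels)
```

In this toy setting the only long edge is the one bridging the two groups, so cutting by length alone separates them. The abstract's claim is that H-NND makes such redundant edges stand out in the same way on harder data.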

Similar resources

Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...

Application of soil properties, auxiliary parameters, and their combination for prediction of soil classes using decision tree model

Soil classification systems are very useful for a simple and fast summarization of soil properties. These systems indicate the method for data summarization and facilitate connections among researchers, engineers, and other users. One of the practical systems for soil classification is Soil Taxonomy (ST). As determining soil classes for an entire area is expensive, time-consuming, and almost ...

Simulated annealing and artificial immune system algorithms for cell formation with part family clustering

Cell formation problem (CFP) is one of the main problems in cellular manufacturing systems. Minimizing exceptional elements and voids is one of the common objectives in the CFP. The purpose of the present study is to propose a new model for cellular manufacturing systems to group parts and machines in dedicated cells using a part-machine incidence matrix to minimize the voids. After identifying...

Using data mining algorithms to examine the factors affecting the prediction of newborns' status at birth

Background & Objective: Prediction of health status in newborns and also identification of its affecting factors is of the utmost importance. There are different ways of prediction. In this study, effective models and patterns have been studied using decision tree algorithm. Method: This study was conducted on 1,668 childbirths in three hospitals of Shohada, Omidi and Mehr in city of Behshahr...

A Hybrid Clustering Criterion for R*-Tree on Business Data

It is well known that multidimensional indices are efficient at improving query performance on relational data. As one successful multidimensional index structure, the R*-tree, a famous member of the R-tree family, is very popular. The clustering pattern of the objects (i.e., tuples in relational tables) among R*-tree leaf nodes is one of the decisive factors in the performance of range queries, a po...

Journal:
  • CoRR

Volume: abs/1509.02805

Publication date: 2015